1.1 Abstract

We propose a test based on the theory of algorithmic complexity and an experimental evaluation of Levin's universal distribution to identify evidence in support of, or in contravention of, the claim that the world is algorithmic in nature. To this end we have undertaken a statistical comparison of the frequency distributions of data from physical sources on the one hand (repositories of information such as images, data stored in a hard drive, computer programs and DNA sequences) and the frequency distributions generated by purely algorithmic means on the other (by running abstract computing devices such as Turing machines, cellular automata and Post Tag systems). Statistical correlations were found and their significance measured.



1.2 Introduction 

A statistical comparison has been undertaken of the frequency distributions of data stored in physical repositories on the one hand (DNA sequences, images, files in a hard drive) and of frequency distributions produced by purely algorithmic means on the other (by running abstract computational devices like Turing machines, cellular automata and Post Tag systems).

2 Information and Computation

A standard statistical measure is adopted for this purpose. The Spearman rank correlation coefficient quantifies the strength of the monotonic relationship between two variables, without assuming any particular functional form (such as linearity) for the relationship between them.



1.2.1 Levin's Universal Distribution 

Consider an unknown operation generating a binary string of length k bits. If the method is uniformly random, the probability of finding a particular string s is exactly 2^{-k}, the same as for any other string of length k. However, data is usually produced not at random but by a process. There is a measure which describes the expected output frequency distribution of an abstract machine running a program. A process that produces a string s with a program p when executed on a universal Turing machine T has probability m(s). As p is itself a binary string, m(s) can be defined as the probability that the output of a universal prefix Turing machine T is s when provided with a sequence of fair coin flip inputs interpreted as a program. Formally,



$$ m(s) = \sum_{p \,:\, T(p) = s} 2^{-|p|} $$



where the sum is over all halting programs p for which T outputs the string s, with |p| the length of the program p. As T is a prefix universal Turing machine, the set of valid programs forms a prefix-free set³, and thus the sum is bounded due to Kraft's inequality, which guarantees that the sum of 2^{-|p|} over any prefix-free set of programs is at most 1. For technical details see [Calude (2002); Li and Vitányi (2008); Downey (2010)].
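The role of the prefix-free condition can be illustrated with a short Python sketch. It is an illustration only: the program sets below are arbitrary toy examples, not programs of any actual universal machine T. The sketch checks whether a finite set of binary strings is prefix-free and computes its Kraft sum of 2^{-|p|} exactly; for a prefix-free set this sum never exceeds 1, while a set that violates the condition can exceed it.

```python
from fractions import Fraction


def is_prefix_free(codes):
    """Return True if no string in `codes` is a proper prefix of another one."""
    ordered = sorted(set(codes))
    # In lexicographic order, every string that extends a given prefix sorts
    # directly after it, so comparing neighbours is sufficient.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(codes):
    """Exact value of the sum of 2^-|p| over the distinct strings p in `codes`."""
    return sum(Fraction(1, 2 ** len(p)) for p in set(codes))


toy_programs = ["0", "10", "110", "111"]
print(is_prefix_free(toy_programs), kraft_sum(toy_programs))  # True 1
```

For the prefix-free set {0, 10, 110, 111} the sum is exactly 1, whereas {0, 01, 1}, in which 0 is a prefix of 01, gives 5/4.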

Formulated by Leonid Levin [Levin (1984)], m has many remarkable properties [Kirchherr and Li (1997)]. It is closely related to the concept of algorithmic complexity [Chaitin (2001)] in that the sum is dominated by its largest term, the one contributed by the shortest program producing s, so one can actually write m(s) as follows:



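$$ m(s) = 2^{-K(s) + O(1)} $$

where K(s) is the algorithmic complexity of s, i.e. the length of the shortest program that outputs s.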

3 No element is a prefix of any other, a property necessary to keep 0 < m(s) < 1 for all s and therefore for m to be a valid probability measure.
In a world of computable processes, m(s) establishes that simple patterns which result from simple processes are likely, while complicated patterns produced by complicated processes (long programs) are relatively unlikely.

It is worth noting that, unlike other probability measures, m is not only a probability distribution establishing that certain objects have a certain probability of occurring; it is also a distribution that orders the particular elements by their individual algorithmic complexity.



1.3 The null hypothesis 

When looking at a large enough set of data following a distribution, one can in statistical terms safely assume that the source generating the data is of the nature that the distribution suggests. Such is the case when a set of data follows, for example, a Gaussian distribution, where, depending on certain statistical variables, one can say with a high degree of certainty that the process generating the data is of a random nature.

When observing the world, the outcome of a physical phenomenon f can be seen as the result of a natural process P. One may ask what the probability distribution of a set of processes of the type of P looks like.

If one would like to know whether the world is algorithmic in nature, one would first need to say what an algorithmic world would look like. To accomplish this, we have conceived and performed a series of experiments to produce data by purely algorithmic means, in order to compare them with sets of data produced by several physical sources. At the right level, a simplification of the data sets into binary language always seems possible. Each observation can measure one or more parameters (weight, location, etc.) of an enumeration of independent distinguishable values, a discrete sequence of values.⁴

If there is no bias in the sampling method or in the generating process itself, and no information about the process is known, the principle of indifference [Thompson (2003)]⁵ states that if there are n > 1 possibilities that are mutually exclusive, collectively exhaustive and distinguishable only by their names, then each possibility should be assigned a probability equal to 1/n as the simplest non-informative prior. The null hypothesis to test is that the frequency distributions studied herein, from several different independent sources, are closer to the experimental calculation of Levin's universal distribution than to the uniform (simplest non-informative prior) distribution. To this end, average output frequency distributions were produced on the one hand by running abstract computing devices such as cellular automata, Post Tag systems and Turing machines, and on the other hand by collecting data from the physical world to build distributions of the same type.

4 This might be seen as an oversimplification of the concept of a natural process and of its outcome when seen as a binary sequence, but the performance of a physical experiment always yields data written as a sequence of individual observations as a valid sample of certain phenomena.

5 Also known as the principle of insufficient reason.

August 12, 2010 0:33 World Scientific Book - 9in x 6in OnTheAlgorithmicNatureOfTheWorld

1.3.1 Frequency distributions 

The distribution of a variable is a description of the relative number of 
times each possible outcome occurs in a number of trials. One of the most 
common probability distributions describing physical events is the normal 
distribution, also known as the Gaussian or Bell curve distribution, with 
values more likely to occur due to small random variations around a mean. 

There is also a particular scientific interest in power-law distributions, 
partly from the ease with which certain general classes of mechanisms gen- 
erate them. The demonstration of a power-law relation in some data can 
point to specific kinds of mechanisms that might underlie the natural phe- 
nomenon in question, and can indicate a connection with other, seemingly 
unrelated systems. 

As explained, however, when no information is available, the simplest distribution is the uniform distribution, in which values are equally likely to occur. In a macroscopic system at least, it must be assumed that the physical laws which govern the system are not known well enough to predict the outcome. If one does not have any reason to choose a specific distribution and no prior information is available, the uniform distribution is the one making no assumptions, according to the principle of indifference. This is supposed to be the distribution of a balanced coin, an unbiased die or a casino roulette, where the probability of an outcome k_i is 1/n if k_i can take one of n possible different outcomes.

1.3.2 Computing abstract machines 

An abstract machine consists of a definition in terms of input, output, and the set of allowable operations used to turn the input into the output. Such machines are of course algorithmic by nature (or by definition). Three of the most popular models of computation in the field of theoretical computer science were used to produce data of a purely algorithmic nature: deterministic Turing machines (denoted by TM), one-dimensional cellular automata (denoted by CA) and Post Tag systems (denoted by TS).

The Turing machine model represents the basic framework underlying many concepts in computer science, including the definition of algorithmic complexity cited above. The cellular automaton is a well-known model which, together with the Post Tag system model, has been studied since the foundation of the field of abstract computation by some of its first pioneers. All three models are Turing-complete. The descriptions of the models follow the formalisms used in [Wolfram (2002)].



1.3.2.1 Deterministic Turing machines 

The Turing machine description consists of a list of rules (a finite program) capable of manipulating a linear list of cells, called the tape, using an access pointer called the head. The finite program can be in any one of a finite set of states Q numbered from 1 to n, with 1 the state at which the machine starts its computation. Each tape cell can contain 0 or 1 (there is no special blank symbol). Time is discrete and the steps are ordered from 0 to t, with 0 the time at which the machine starts its computation. At any given time, the head is positioned over a particular cell, and the finite program starts in state 1. At time 0 all cells contain the same symbol, either 0 or 1. A rule i can be written in a 5-tuple notation as follows: {s_i, k_i, s'_i, k'_i, d_i}, where s_i is the tape symbol the machine's head is scanning at time t, k_i the machine's current 'state' (instruction) at time t, s'_i a unique symbol to write (the machine can overwrite a 1 on a 0, a 0 on a 1, a 1 on a 1, or a 0 on a 0) at time t + 1, k'_i a state to transition into (which may be the same as the one it was already in) at time t + 1, and d_i a direction to move in at time t + 1, either to the right (R) cell or to the left (L) cell, after writing. Based on a set of rules of this type, usually called a transition table, a Turing machine can perform the following operations: 1. write an element from A = {0, 1}, 2. shift the head one cell to the left or right, 3. change the state of the finite program out of Q. When the machine is running it executes the above operations at the rate of one operation per step. At a time t the Turing machine produces an output described by the contiguous cells in the tape visited by the head.
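The operation described above can be sketched as a small Python simulator. This is a minimal sketch under stated assumptions: the transition table is a hypothetical dictionary mapping (state, scanned symbol) to (symbol to write, next state, move), with move +1 for R and -1 for L; there is no halting state; and the output is taken to be the contiguous cells visited by the head, including the cell under the head after the last step. It does not reproduce the enumeration or sampling used in the experiments.

```python
def run_turing_machine(rules, steps, blank=0):
    """Run a 2-symbol Turing machine without a halting state for `steps` steps.

    rules maps (state, scanned_symbol) -> (symbol_to_write, next_state, move),
    with move = +1 for R and -1 for L. The machine starts in state 1 on a tape
    whose cells all hold `blank` (0 or 1) and returns the contiguous cells
    visited by the head as a string.
    """
    tape = {}          # sparse tape: cells never written still hold `blank`
    head, state = 0, 1
    lo = hi = 0        # leftmost and rightmost positions reached by the head
    for _ in range(steps):
        symbol = tape.get(head, blank)
        write, state, move = rules[(state, symbol)]
        tape[head] = write
        head += move
        lo, hi = min(lo, head), max(hi, head)
    return "".join(str(tape.get(i, blank)) for i in range(lo, hi + 1))


# Hypothetical 1-state machine: always write 1 and move right.
always_right = {(1, 0): (1, 1, +1), (1, 1): (1, 1, +1)}
print(run_turing_machine(always_right, 5))  # 111110
```

Started on a tape of 0s, this machine returns five written cells plus the still-unwritten cell under the head. Running the same table again with blank=1 mirrors the two initial tapes used in the experiments.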

Let T(0), T(1), …, T(n), … be a natural recursive enumeration of all 2-symbol deterministic Turing machines. One can, for instance, begin enumerating by number of states, starting with all 2-state Turing machines, then 3-state, and so on. Let n, t and k be three integers. Let s(T(n), t) be the part of the contiguous tape cells that the head visited after t steps. Let us consider all the k-tuples, i.e. all the substrings of length k from s(T(n), t) = {s_1, s_2, …, s_u}, i.e. the following u − k + 1 k-tuples:

$$ \{(s_1, \ldots, s_k), (s_2, \ldots, s_{k+1}), \ldots, (s_{u-k+1}, \ldots, s_u)\}. $$

Now let N be a fixed integer. Let us consider the set of all the k-tuples produced by the first N Turing machines according to a recursive enumeration after running for t steps each. Let us take the count of each k-tuple produced.

From the count of all the k-tuples, listing all distinct strings together with their frequencies of occurrence, one gets a probability distribution over the finite set of strings in {0, 1}^k.

For the Turing machines, the experiments were carried out with 2-symbol 3-state Turing machines. There are (4n)^{2n} possible different n-state 2-symbol Turing machines according to the 5-tuple rule description cited above, since each of the 2n (state, scanned symbol) pairs can be assigned any of 2 × 2 × n = 4n combinations of symbol to write, direction and next state. Therefore there are (4 × 3)^{(2 × 3)} = 2985984 2-symbol 3-state Turing machines. A sample of 2000 2-symbol 3-state Turing machines was taken. Each Turing machine's runtime was set to t = 100 steps, starting with a tape filled with 0s and then once again with a tape filled with 1s, in order to avoid any undesired asymmetry due to a particular initial setup.

1.3.2.2 One-dimensional Cellular Automata

An analogous standard description of one-dimensional 2-color cellular automata was followed. A one-dimensional cellular automaton is a collection of cells on a row that evolves through discrete time according to a set of rules based on the states of neighboring cells, applied in parallel to each row over time. When the cellular automaton starts its computation, it applies the rules at a first step t = 0. If m is an integer, a neighborhood of m refers to the m cells on each side, together with the central cell, that the rule takes into consideration at row t to determine the value of a cell at step t + 1. If m is a fraction of the form p/q, then p − 1 are the cells to the left and q − 1 the cells to the right taken into consideration by the rules of the cellular automaton.

For cellular automata, the experiments were carried out with 3/2-range neighbor cellular automata starting from a single 1 on a background of 0s, and then again starting from a single 0 on a background of 1s, to avoid any undesired asymmetry from the initial setup. There are 2^{2m+1} possible states for the cells neighboring a given cell (m at each side plus the central cell), and two possible outcomes for the new cell; there are therefore a total of 2^{2^{2m+1}} one-dimensional m-neighbor 2-color cellular automata, hence 2^{2^{(2 × 3/2) + 1}} = 2^{16} = 65536 cellular automata with rules taking two neighbors to the left and one to the right. A sample of 2000 3/2-range neighbor cellular automata was taken.

As for Turing machines, let A(1), A(2), …, A(n), … be a natural recursive enumeration of one-dimensional 2-color cellular automata. For example, one can start enumerating them by neighborhood, starting from range 1 (nearest-neighbor) to 3/2-neighbor, to 2-neighbor and so on, but this is not mandatory.

Let n, t and k be three integers. For each cellular automaton A(n), let s(A(n), t) denote the output of the cellular automaton, defined as the contiguous cells of the last row produced after t = 100 steps starting from a single black or white cell as described above, up to the length that the scope of the application of the rules starting from the initial configuration may have reached (usually the last row of the characteristic cone produced by a cellular automaton). As was done for Turing machines, tuples of length k were extracted from the output.
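A 3/2-range cellular automaton as described above can be sketched in Python as follows. The rule-numbering convention is an assumption made for illustration (bit 8a + 4b + 2c + d of the rule number gives the new value for a cell whose neighbors at offsets -2, -1, 0, +1 hold a, b, c, d) and may not match the standard numbering. Because each new cell reads two cells to its left, the region influenced by the initial cell grows by one cell to the left and two to the right per step, so the sketch keeps exactly that cone and tracks the background separately, since with a background of 1s the background itself may change.

```python
def run_ca_range_3_2(rule, steps, seed=1):
    """Evolve a 2-color cellular automaton with 2 left and 1 right neighbours.

    Assumed numbering: bit 8*a + 4*b + 2*c + d of `rule` (0..65535) is the new
    value of a cell whose cells at offsets -2, -1, 0, +1 hold a, b, c, d.
    Starts from a single `seed` cell on a background of 1 - seed and returns
    the last row of the light cone (positions -steps .. 2*steps) as a string.
    """
    def new_cell(a, b, c, d):
        return (rule >> (8 * a + 4 * b + 2 * c + d)) & 1

    background = 1 - seed
    row = [seed]
    for _ in range(steps):
        # Three background cells on each side cover every neighbourhood needed.
        padded = [background] * 3 + row + [background] * 3
        # The cone grows by one cell on the left and two on the right.
        row = [new_cell(*padded[j:j + 4]) for j in range(len(row) + 3)]
        # Cells far outside the cone all see a uniform background neighbourhood.
        background = new_cell(background, background, background, background)
    return "".join(map(str, row))


print(run_ca_range_3_2(65280, 2))  # 0000001 (rule 65280 copies the cell at offset -2)
```

The returned row is the string from which k-tuples would then be extracted, exactly as for the Turing machine outputs.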

1.3.2.3 Post Tag Systems 

A Tag system is a triplet (m, A, P), where m is a positive integer, called the deletion number, A is the alphabet of symbols (in this paper a binary alphabet) and P is the set of production rules. Finite (possibly empty) strings can be made of the alphabet A. A computation by a Tag system is a finite sequence of strings produced by iterating a transformation, starting with a given initial string at time t = 0. At each iteration m elements are removed from the beginning of the sequence and a set of elements determined by the production rule P is appended onto the end, based on the elements that were removed from the beginning. Since there is no generalized standard enumeration of Tag systems⁶, a random set of rules was generated, each rule having equal probability. Rules are bound by the number r of elements (digits) on the left and right hand blocks of the rule. If blocks of up to r symbols can be added at each step, there are a total of (k^{r+1} − 1)/(k − 1) possible blocks over a k-symbol alphabet (including the empty block), i.e. 15 for k = 2 and r = 3. For r = 3, there are 50625 = 15^4 different 2-symbol Tag systems with deletion number 2. In this experiment, a sample of 2000 2-Tag systems (Tag systems with deletion number 2) was used to generate the frequency distributions of Tag systems to compare with.

6 To the authors' knowledge.

An example of a rule is {0 → 10, 1 → 011, 00 → ε, 01 → 10, 10 → 11}, where no term on either side has more than 3 digits and there is no fixed number of elements other than that imposed to avoid assigning a string to several different, i.e. ambiguous, rules. The empty string ε can only occur among the right-hand terms of the rules. The random generation of a set of rules yields results equivalent to those obtained by following and exhausting a natural recursive enumeration.
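A Tag system run can be sketched in Python as follows. As an assumption for illustration, the hypothetical rule set below only has left-hand blocks of exactly m symbols (unlike the example rule above, which mixes blocks of length 1 and 2), and the run stops when the string becomes shorter than m or a removed block has no rule. How a run is turned into an output string for tuple counting is not specified here, so the sketch simply returns the successive strings.

```python
def run_tag_system(rules, initial, steps, m=2):
    """Iterate a tag system with deletion number m for at most `steps` steps.

    rules maps each block of m removed symbols to the block appended at the
    end ("" is the empty string). The run stops early when the string is
    shorter than m or the removed block has no rule. Returns all successive
    strings, starting with `initial`.
    """
    history = [initial]
    current = initial
    for _ in range(steps):
        block = current[:m]
        if len(current) < m or block not in rules:
            break
        current = current[m:] + rules[block]
        history.append(current)
    return history


# Hypothetical 2-tag rule set with right-hand blocks of at most 3 symbols.
example_rules = {"00": "", "01": "10", "10": "11", "11": "011"}
print(run_tag_system(example_rules, "1001", 3))
# ['1001', '0111', '1110', '10011']
```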

As an illustration⁷, assume that the output of the first 4 Turing machines following an enumeration yields the output strings 01010, 11111, 11111 and 01 after running t = 100 steps. If k = 3, the tuples of length 3 from the output of these Turing machines are ((010, 101, 010), (111, 111, 111), (111, 111, 111)), the string 01 being shorter than k and contributing none; or, grouped and sorted from higher to lower frequency, (111, 010, 101) with frequency values 6, 2 and 1 respectively. The frequency distribution is therefore ((111, 2/3), (010, 2/9), (101, 1/9)), i.e. the string followed by the count divided by the total. If the strings have the same frequency value they are lexicographically sorted.
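The counting procedure can be sketched in Python as follows (a minimal sketch; the function name is ours). Overlapping k-tuples are collected from every output string, normalised by the total count and sorted by decreasing frequency with lexicographic tie-breaking. Applied to the illustrative outputs 01010, 11111, 11111 and 01 with k = 3, it reproduces the distribution given in the illustration.

```python
from collections import Counter
from fractions import Fraction


def ktuple_distribution(outputs, k):
    """Frequency distribution of the overlapping k-tuples of all outputs.

    Returns (tuple, frequency) pairs sorted by decreasing frequency, with
    ties broken lexicographically. Outputs shorter than k contribute nothing.
    """
    counts = Counter(
        s[i:i + k] for s in outputs for i in range(len(s) - k + 1)
    )
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(t, Fraction(c, total)) for t, c in ranked]


for t, f in ktuple_distribution(["01010", "11111", "11111", "01"], 3):
    print(t, f)
# 111 2/3
# 010 2/9
# 101 1/9
```

Exact fractions are used so that the normalisation can be checked against hand calculations; floating-point frequencies would serve equally well for plotting.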



Fig. 1.1 The output frequency distributions from running abstract computing machines (panel: probability distributions, k = 5). The x-axis shows all the 2^k tuples of length k sorted from most to least frequent. The y-axis shows the frequency values (probability between 0 and 1) of each tuple on the x-axis.



The output frequency distributions produced by abstract machines as described above are evidently algorithmic by nature (or by definition), and they will be compared both with each other and with the distributions extracted from the physical world.



7 For illustration only; no actual enumeration was followed.
1.3.3 Physical sources

Samples from physical sources such as DNA sequences, random images from the web and data stored in a hard drive were taken and transformed into data of the same type (i.e. binary tuples) as that produced by the abstract computing machines. We proceeded as follows: a representative set of random images was taken from the web using the random image function available online at the Wikimedia Commons website at http://commons.wikimedia.org/wiki/Special:Random/File (as of May 2009), none of them larger than 1500 linear pixels⁸, to avoid any bias due to very large images. All images were transformed using the Mathematica function Binarize, which converts multichannel and color images into black-and-white images by replacing all values above a globally determined threshold with 1 and all others with 0. Then all the k-tuples for each row of the image were taken and counted, just as if they had been produced by the abstract machines from the previous section.
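The row-wise treatment of images can be sketched in Python as follows. This is an approximation under stated assumptions: the image is given as a hypothetical list of rows of grayscale intensities in [0, 1], and a fixed global threshold (0.5 by default) stands in for the automatically determined threshold of Mathematica's Binarize. Pixels above the threshold become 1, and tuples are taken within each row only.

```python
from collections import Counter


def image_row_ktuples(gray_rows, k, threshold=0.5):
    """Binarize a grayscale image and count the k-tuples of every row.

    gray_rows is a list of rows of intensities in [0, 1]. Pixels strictly above
    `threshold` (an assumed fixed value) become 1 and the others 0; tuples
    never cross row boundaries.
    """
    counts = Counter()
    for row in gray_rows:
        bits = "".join("1" if value > threshold else "0" for value in row)
        counts.update(bits[i:i + k] for i in range(len(bits) - k + 1))
    return counts


toy_image = [[0.9, 0.1, 0.8, 0.2], [0.0, 0.0, 1.0, 1.0]]
print(image_row_ktuples(toy_image, 2))
```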

Another source of physical information for comparative purposes was a random selection of human gene sequences of deoxyribonucleic acid (or simply DNA). The DNA was extracted from a random sample of 100 different genes (the actual selection is posted on the project website cited in section 1.6).

There are four possible encodings for translating a DNA sequence into a binary string using a single bit for each letter: {G → 1, T → 1, C → 0, A → 0}, {G → 0, T → 1, C → 0, A → 1}, {G → 1, T → 0, C → 1, A → 0}, {G → 0, T → 0, C → 1, A → 1}.

To avoid any artificially induced asymmetries due to the choice of a particular encoding, all four encodings were applied to the same sample to build a joint frequency distribution. All the k-tuples were counted and ranked likewise.
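The joint distribution over the four encodings can be sketched in Python as follows. Skipping any symbol other than A, C, G and T (for instance N) is an assumption made here, not something stated in the text. Since the four encodings form two pairs of bitwise complements, the joint counts are symmetric under exchanging 0 and 1.

```python
from collections import Counter

# The four one-bit encodings listed in the text.
ENCODINGS = [
    {"G": "1", "T": "1", "C": "0", "A": "0"},
    {"G": "0", "T": "1", "C": "0", "A": "1"},
    {"G": "1", "T": "0", "C": "1", "A": "0"},
    {"G": "0", "T": "0", "C": "1", "A": "1"},
]


def dna_joint_ktuples(sequences, k):
    """Count the k-tuples of every sequence under all four encodings jointly.

    Symbols other than A, C, G, T are skipped (an assumption of this sketch).
    """
    counts = Counter()
    for seq in sequences:
        for table in ENCODINGS:
            bits = "".join(table[base] for base in seq.upper() if base in table)
            counts.update(bits[i:i + k] for i in range(len(bits) - k + 1))
    return counts


print(dna_joint_ktuples(["GA"], 2))  # Counter({'10': 2, '01': 2})
```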

There might still be some bias from sampling genes rather than DNA segments, because genes might be seen as conceived by researchers to focus on functional segments of the DNA. We have, however, done the same experiments taking only a sample of a sample of genes, which is not a gene by itself but a legitimate sample of the DNA, producing the same results (i.e. the distributions remain stable). Yet the finding of a higher algorithmicity when taking gene samples, as opposed to general DNA sampling, might suggest that there is effectively an embedded encoding for genes in the DNA as functional subprograms, and not a mere research convenience.



8 The sum of the width and height. 
A third source of information from the real world was a sample of data contained in a hard drive. A list of all the files contained in the hard drive was generated using a script, and a sample of 100 files was taken for comparison, with none of the files being greater than 1 Mb, in order to avoid any bias due to a very large file. The stream of each file was likewise cut into k-tuples, counted and ranked to produce the frequency distribution, as for DNA. The count for each of the sources yielded a frequency distribution of k-tuples (the binary strings of length k) to compare with.
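The same counting applied to files can be sketched in Python as follows. Reading each byte most significant bit first, and interpreting 1 Mb as 10^6 bytes, are assumptions made for this sketch; the text specifies neither.

```python
from collections import Counter


def bytes_ktuples(data, k):
    """Count the k-tuples of the bit stream of `data` (MSB of each byte first)."""
    bits = "".join(format(byte, "08b") for byte in data)
    return Counter(bits[i:i + k] for i in range(len(bits) - k + 1))


def files_ktuples(paths, k, max_bytes=1_000_000):
    """Joint k-tuple counts over files, skipping files above `max_bytes`.

    The cut-off of 10**6 bytes is an assumed reading of the 1 Mb limit.
    """
    counts = Counter()
    for path in paths:
        with open(path, "rb") as handle:
            data = handle.read()
        if len(data) <= max_bytes:
            counts.update(bytes_ktuples(data, k))
    return counts


print(bytes_ktuples(b"\x0f", 4))
# Counter({'0000': 1, '0001': 1, '0011': 1, '0111': 1, '1111': 1})
```

As in the text, each file's stream is cut into overlapping tuples and all files contribute to a single joint count.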

One may think that data stored in a hard drive already has a strong algorithmic component by the way it has been produced (or stored in a digital computer), and that it therefore makes little or no sense to compare it with any of the algorithmic or empirical distributions. It is true that the data stored in a hard drive lies in between what we may consider the abstract and the physical worlds, which, from our point of view, makes it interesting as an experiment in its own right. More importantly, data stored in a hard drive is of a very different nature, from text files subject to the rules of language, to executable programs, to music and video, all together in a single repository. Hence, it is not at all obvious why a frequency distribution from such a rich source of different kinds of data should end up resembling other distributions produced by other physical sources or by abstract machines.



1.3.4 Hypothesis testing 

The frequency distributions generated by the different sources were statis- 
tically compared to look for any possible correlation. A correlation test 
was carried out and its significance measured to validate either the null 
hypothesis or the alternative (the latter being that the similarities are due 
to chance). 

Each frequency distribution is the result of the count of the number of 
occurrences of the k-tuples from which the binary strings of length k were 
extracted. Comparisons were made with k set from 4 to 7. 



1.3.4.1 Spearman's rank correlation coefficient 



The Spearman rank correlation coefficient [Snedecor and Cochran (1989)] is a non-parametric measure of correlation that makes no assumptions about the frequency distribution of the variables. Spearman's rank correlation coefficient is equivalent to the Pearson correlation on ranks. Spearman's rank correlation coefficient is usually denoted by the Greek letter ρ. The Spearman rank correlation coefficient is calculated as follows:



$$ \rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} $$

where d_i is the difference between each rank of corresponding values of x and y, and n the number of pairs of values.
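The formula can be sketched directly in Python. Tied values receive the average of their ranks; note that with ties the formula is only an approximation, and the exact value is then the Pearson correlation of the ranks. Frequency distributions with many equal or zero entries will contain ties.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            result[idx] = average
        i = j + 1
    return result


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

For these two lists the rank differences are -1, 1, -1, 1, 0, so the sum of squares is 4 and ρ = 1 − 24/120 = 0.8.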

Spearman's rank correlation coefficient can take real values from −1 to 1, where −1 is a perfect negative (inverse) correlation, 0 is no correlation and 1 is a perfect positive correlation.

The approach to testing whether an observed value of ρ is significantly different from zero, considering the number of elements, is to calculate the probability that it would be greater than or equal to the observed ρ under the hypothesis that the correlation is due to chance, using a permutation test [Good (2005)], to ascertain that the value of ρ obtained is unlikely to occur by chance (the alternative hypothesis). The common convention is that if the p-value is between 0.01 and 0.001 the correlation is strong enough, indicating that having found the correlation is very unlikely to be a matter of chance, since it would occur one time out of a hundred (if closer to 0.01) or a thousand (if closer to 0.001), while if it is between 0.10 and 0.01 the correlation is said to be weak, although still quite unlikely to occur by chance, since it would occur one time out of ten (if closer to 0.10) or a hundred (if closer to 0.01).⁹ The lower the significance level, the stronger the evidence in favor of the null hypothesis. Tables 1.2, 1.3, 1.4 and 1.5 show the Spearman coefficients between all the distributions for a given tuple length k.
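A one-sided permutation test for ρ can be sketched in Python as follows. It is a minimal sketch, assuming no ties and a fixed number of random shuffles (10 000 by default, with a fixed seed for reproducibility); the returned p-value is the fraction of shuffles of y whose ρ is at least the observed one, with the usual +1 correction so that it is never exactly zero.

```python
import random


def spearman_permutation_pvalue(x, y, trials=10_000, seed=0):
    """Return (observed rho, one-sided permutation p-value) for x and y."""
    def rank(values):  # ranks 1..n; assumes no ties for brevity
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for position, index in enumerate(order, start=1):
            r[index] = position
        return r

    def rho(a, b):
        n = len(a)
        d2 = sum((p - q) ** 2 for p, q in zip(rank(a), rank(b)))
        return 1 - 6 * d2 / (n * (n * n - 1))

    rng = random.Random(seed)
    observed = rho(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        # Shuffling y breaks any association with x (the "chance" hypothesis).
        rng.shuffle(shuffled)
        if rho(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


print(spearman_permutation_pvalue(list(range(10)), list(range(10)), trials=2000))
```

A perfectly concordant pair of rankings yields a p-value close to 1/(trials + 1), well below 0.01, while a perfectly inverse pair yields 1.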

When graphically compared, the actual frequency values of each tuple among the 2^k tuples reveal a correlation in values across different distributions. The x and y axes are in the same configuration as in Fig. 1.1. The x-axis plots the 2^k tuples of length k but, unlike in Fig. 1.1, they are lexicographically sorted (as the result of converting the binary string into a decimal number). Table 1.6 shows this lexicographical order as an illustration for k = 4. The y-axis plots the frequency value (probability between 0 and 1) for each tuple on the x-axis.
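Before ranking, two distributions have to be compared over the same list of tuples. The following Python sketch lists all 2^k tuples in the lexicographic order described in the text (the tuple at index j is the binary representation of j) and assigns frequency 0 to tuples that never occur; treating absent tuples as 0 is an assumption of this sketch.

```python
from itertools import product


def aligned_frequencies(counts, k):
    """Frequencies of all 2**k tuples in lexicographic order (missing ones = 0).

    counts maps k-bit strings to occurrence counts; position j of the result
    corresponds to the tuple whose binary value is j.
    """
    total = sum(counts.values())
    tuples = ["".join(bits) for bits in product("01", repeat=k)]
    return [counts.get(t, 0) / total if total else 0.0 for t in tuples]


print(aligned_frequencies({"11": 3, "01": 1}, 2))  # [0.0, 0.25, 0.0, 0.75]
```

Two such aligned lists, one per source, are what a rank correlation such as Spearman's ρ would then be computed on.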



9 Useful tables with the calculation of levels of significance for different numbers of ranked elements are available online (e.g. at http://www.york.ac.uk/depts/maths/histstat/tables/spearman.ps as of May 2009).
[Figure panels: Cellular automata distribution; Hard drive distribution]
Table 1.2 Spearman coefficients for k = 4. Coefficients indicating a significant correlation are marked with †, while correlations with higher significance are marked with ‡.
1.3.5 The problem of overfitting

When looking at a set of data following a distribution, one can safely claim in statistical terms that the source generating the data is of the nature that the distribution suggests. Such is the case when a set of data follows a model where, depending on certain variables, one can say with some degree of certainty that the process generating the data follows the model.

However, a common problem is overfitting, that is, a false model that may fit an observed phenomenon perfectly.¹⁰ Levin's universal distribution, however, is optimal over all distributions [?], in the sense that the algorithmic model is by itself the simplest possible model fitting the data if the data were produced by some algorithmic process. This is because m is precisely the result of a distribution assuming the simplest model in algorithmic complexity terms, in which the shortest programs produce the elements leading the distribution. That does not mean, however, that it must necessarily be the right or the only possible model explaining the nature of the data, but the model itself is not open to an excess-of-parameters argument. A statistical comparison cannot actually be used to categorically prove or disprove a difference or similarity, only to favor one hypothesis over another.

10 For example, Ptolemy's model of the solar system.

Table 1.3 Spearman coefficients for k = 5.

Table 1.4 Spearman coefficients for k = 6.

Table 1.5 Spearman coefficients for k = 7.

Table 1.6 Illustration of the simple lexicographical order of the 2^4 tuples of length k = 4 as plotted on the x-axis.

Fig. 1.2 Frequency distributions of the tuples of length k from physical sources: binarized random files contained in a hard drive (HD), binarized sequences of deoxyribonucleic acid (DNA) and binarized random images from the world wide web. The data points have been joined for clarity.



1.4 Possible applications 

Fig. 1.3 Frequency distributions of the tuples of length k from abstract computing machines: deterministic Turing machines (TM), one-dimensional cellular automata (CA) and Post Tag systems (TS).

Common data compressors are of the entropy coding type. Two of the most popular entropy coding schemes are Huffman coding and arithmetic coding. Entropy coders encode a given set of symbols with the minimum number of bits required to represent them. These compression algorithms assign a unique prefix code to each unique symbol that occurs in the input, replacing each fixed-length input symbol by the corresponding variable-length prefix codeword. The length of each codeword is approximately proportional to the negative logarithm of the probability of its symbol. Therefore, the most common symbols use the shortest codes.
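The relation between probabilities and codeword lengths can be seen in a minimal Huffman construction, sketched in Python below (only codeword lengths are computed, not the codewords themselves). For dyadic probabilities such as 1/2, 1/4, 1/8, 1/8 the lengths equal the negative base-2 logarithm of the probabilities exactly: 1, 2, 3 and 3 bits.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code built from freqs."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {s: 0}) for s, w in freqs.items()]
    if len(heap) <= 1:
        return {s: 1 for s in freqs}
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


print(huffman_code_lengths({"a": 4, "b": 2, "c": 1, "d": 1}))
# {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```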

Another popular compression technique based on the same principle is run-length encoding (RLE)¹¹, wherein long runs of consecutive identical data values are replaced by a simple code with the data value and the length of the run. This is an example of lossless data compression. However, none of these methods seems to follow any prior distribution¹², which means all of them are a posteriori techniques that, after analyzing a particular image, set their parameters to better compress it. A sort of prior compression distribution may be found in the so-called dictionary coders, also known as substitution coders, which operate by searching for matches between the text to be compressed and a set of strings contained in a static data structure.
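Run-length encoding, as described above, can be sketched in a few lines of Python using itertools.groupby; representing the output as (symbol, run length) pairs is one choice among many.

```python
from itertools import groupby


def rle_encode(data):
    """Run-length encode a string as a list of (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs):
    """Invert rle_encode by expanding each run."""
    return "".join(symbol * length for symbol, length in pairs)


print(rle_encode("0001111011"))  # [('0', 3), ('1', 4), ('0', 1), ('1', 2)]
```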

11 Implementations of run-length encoding in different programming languages are available at http://rosettacode.org/wiki/Run-length_encoding

12 The authors were unable to find any reference to a general prior image compression 
distribution. 
Fig. 1.4 Comparisons of all frequency distributions of tuples of length k, from physical sources and from abstract computing machines.



In practice, however, it is usually assumed that compressing an image is image dependent, i.e. different from image to image. This is true when prior knowledge of the image is available, or when there is enough time to analyze the file so that a different compression scheme can be set up and used every time. Indeed, compressors achieve greater rates because images have certain statistical properties which can be exploited. But what the experiments carried out here suggest, for example, is that a general optimal compressor for images, based on the frequency distribution for images, can be effectively devised and be useful in cases when neither prior knowledge nor enough time to analyze the file is available. The distributions found, and tested to be stable, could therefore be used for prior image compression techniques. The same sort of applications can also be made for other data sets, taking advantage of the kind of exhaustive calculations carried out in our experiments.

The procedure may also suggest a measure of algorithmicity relative to a
model of computation: a system is more or less algorithmic in nature
according to how close it is to the average distribution of an abstract
model of computation. It has also been shown [Delahaye and Zenil (2007)]
that the calculation of these distributions constitutes an effective
procedure for the numerical evaluation of the algorithmic complexity of
short strings, and a means of giving the definition of algorithmic
complexity a stability independent of additive constants, unfolding
runtime.
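For reference, the link between output frequency and algorithmic complexity
behind this procedure is Levin's coding theorem. In the standard formulation
below, U is a prefix-free universal machine, m(s) is the algorithmic
probability of the string s and K(s) its algorithmic complexity; D(s), our
notation here, stands for the experimental frequency with which the
enumerated machines produce s, which is assumed to approximate m(s).

```latex
m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|}, \qquad
K(s) = -\log_2 m(s) + O(1)
\quad\Longrightarrow\quad
K(s) \approx -\log_2 D(s)
```

For example, a string produced with frequency D(s) = 1/8 would be assigned
an estimated complexity of about 3 bits. The additive constant hidden in
O(1) depends on U, so such estimates are relative to the chosen model of
computation.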

1.5 The meaning of algorithmic 

It may be objected that we have been careless in our use of the term
"algorithmic", not saying exactly what we mean by it. Nevertheless,
"algorithmic" means nothing other than what this paper has tried to convey
through the stance taken over the course of its arguments.

In our context, "algorithmic" is the adjective given to a set of processes
or rules capable of being effectively carried out by a computer, as opposed
to a truly (indeterministic) random process, which is uncomputable.
Classical models of computation capture what an algorithm is, but this paper
(or what it implies) experimentally conveys the meaning of algorithmic both
in theory and in practice, attempting to align the two. On the one hand, we
had the theoretical basis of algorithmic probability. On the other hand, we
had the empirical data. We had no way to compare one with the other directly
because of the non-computability of Levin's distribution (which would allow
us to evaluate the algorithmic probability of an event). We proceeded,
however, by constructing an experimental algorithmic distribution by running
abstract computing machines (hence a purely algorithmic distribution), which
we then compared to the distribution of empirical data, finding several
kinds of correlations with different degrees of significance. For us,
therefore, algorithmic means the exponential accumulation of
pattern-producing rules and the isolation of randomness-producing rules; in
other words, the accumulation of simple rules.
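The construction of an experimental algorithmic distribution can be
illustrated with elementary cellular automata, one of the families of
abstract machines used here. The sketch below is a simplified, hypothetical
setup rather than the exact one behind our results: it runs all 256
elementary rules from a single black cell on a ring of assumed width for an
assumed number of steps, cuts the final row into non-overlapping k-tuples,
and tallies how often each tuple occurs across all rules.

```python
from collections import Counter


def eca_step(cells: list, rule: int) -> list:
    """One synchronous update of an elementary cellular automaton on a ring."""
    n = len(cells)
    out = []
    for i in range(n):
        # cells[i - 1] wraps to the last cell when i == 0 (periodic boundary).
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # value 0..7
        out.append((rule >> neighbourhood) & 1)  # look up the rule's bit
    return out


def eca_distribution(k: int = 4, width: int = 16, steps: int = 8) -> Counter:
    """Tally k-tuples in the final row of every ECA rule from one black cell.

    Hypothetical parameters: the ring width, the number of steps and the use
    of non-overlapping tuples are illustrative, not the paper's settings.
    """
    tally = Counter()
    for rule in range(256):
        cells = [0] * width
        cells[width // 2] = 1  # single black cell as initial condition
        for _ in range(steps):
            cells = eca_step(cells, rule)
        row = "".join(map(str, cells))
        for i in range(0, width - k + 1, k):
            tally[row[i:i + k]] += 1
    return tally
```

Calling `eca_distribution().most_common(5)` lists the most frequent tuples;
it is this ranking, rather than the raw counts, that is compared with the
distributions of empirical data.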

Our definition of algorithmic is actually much stronger than the one that
merely opposes it to true randomness, because in our context something is
algorithmic if it follows an algorithmic distribution (e.g. the experimental
distribution we calculated). One can therefore take this to be a measure of
algorithmicity: the degree to which a data set approaches an experimentally
produced algorithmic distribution (assumed to approximate Levin's
distribution). The closer a data set is to an algorithmic distribution, the
more algorithmic it is.
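Under this definition, one simple way to score the algorithmicity of a data
set is the Spearman rank correlation coefficient adopted earlier in the
paper, computed between the frequencies of k-tuples in the data and in an
algorithmic distribution. The sketch below implements Spearman's rho from
scratch (the Pearson correlation of tie-averaged ranks); the function name
`algorithmicity` and the choice to give a tuple missing from one table a
frequency of 0 are assumptions made for illustration.

```python
def average_ranks(values: list) -> list:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


def algorithmicity(empirical: dict, algorithmic: dict) -> float:
    """Rank-correlate two frequency tables over the union of their tuples.

    Assumption: a tuple absent from one table has frequency 0 there.
    """
    keys = sorted(set(empirical) | set(algorithmic))
    return spearman([empirical.get(s, 0) for s in keys],
                    [algorithmic.get(s, 0) for s in keys])
```

A score near 1 means the data set ranks its tuples much as the algorithmic
distribution does, a score near 0 means there is no monotonic relationship,
and negative values mean the orderings are reversed. The sketch computes
only the coefficient; assessing its significance requires a separate test.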

So when we state that a process is algorithmic in nature, we mean that it
is composed of simple, deterministic rules, rules that produce patterns, as
algorithmic probability theoretically predicts. We think this is true of the
market too, despite its particular dynamics, just as it is true of empirical
data from the other, very different sources in the physical world that we
have studied.

13 Albeit assuming the Church-Turing thesis.



1.6 Conclusions 

Our findings suggest that the information in the world might be the result
of processes resembling those carried out by computing machines. This does
not necessarily imply any trivial reduction beyond speaking of simple
algorithmic rules generating the data, as opposed to random or truly
complicated ones. We think that these correlations are mainly due to one
reason: general physical processes are dominated by simple algorithmic
rules. For example, the processes involved in the replication and
transmission of DNA have been found [Li (1999)] to be concatenation, union,
reverse, complement, annealing and melting, all of them very simple in
nature. The same kind of simple rules may be responsible for the rest of the
empirical data, even though it looks complicated or random. As opposed to
simple rules, one might think that nature performs processes represented by
complicated mathematical functions, such as partial differential equations
or all kinds of sophisticated functions and possible algorithms. The
simplicity of the DNA operations, however, suggests that DNA carries a
strong algorithmic component, indicating that it has developed as a result
of algorithmic processes over time, layer after layer of accumulated simple
rules applied over and over.

So, if the distribution of a data set approaches a distribution produced by
purely algorithmic machines rather than the uniform distribution, one may be
persuaded, with some degree of certainty, that the source of the data is of
the same (algorithmic) nature, just as one would accept a normal
distribution as the footprint of a generating process of some random
nature. The scenario described herein is the following: the frequency
distributions of different data sets from unrelated sources share some
properties, and a theory explaining the data (its regularities) has been
presented in this paper.

There has hitherto been no way to either verify or refute the
information-theoretic notion, beyond the metaphor, that the universe can be
conceived either as the output of some computer program or as some sort of
vast digital computation device, as suggested by

We think we have devised herein a valid statistical test, independent of
any bias toward either possibility. Some indications of correlation have
been found, with weak to strong significance. This is the case with the
distributions from the chosen abstract devices, as well as with the data
from the chosen physical sources: each group by itself turned out to show
several degrees of correlation. While the correlation between the two
groups was partial, each distribution was correlated with at least one
distribution produced by an abstract model of computation. In other words,
in these terms the physical world turned out to be statistically similar to
the simulated one.